Visual Attention in Dynamic Environments and its Application to Playing Online Games
Abstract: In this thesis we present a prototype of Cognitive Programs (CPs), an executive controller built on top of the Selective Tuning (ST) model of attention. CPs enable top-down control of the visual system and interaction between low-level vision and higher-level task demands.
We implement a subset of CPs for playing online video games in real time using only visual input. Two commercial closed-source games, Canabalt and Robot Unicorn Attack, are used for evaluation. Their simple gameplay and minimal controls put the emphasis on reaction speed and attention rather than planning.
Our implementation of Cognitive Programs plays both games at human expert level, experimentally demonstrating the validity of the concept. Additionally, we resolved multiple theoretical and engineering issues, e.g., extending CPs to dynamic environments, finding suitable data structures for describing the task and information flow within the network, and determining the correct timing for each process.
Rapid Visual Categorization is not Guided by Early Salience-Based Selection
The current dominant visual processing paradigm in both human and machine
research is the feedforward, layered hierarchy of neural-like processing
elements. Within this paradigm, visual saliency is seen by many to have a
specific role, namely that of early selection. Early selection is thought to
enable very fast visual performance by limiting processing to only the most
salient candidate portions of an image. This strategy has led to a plethora of
saliency algorithms that have indeed improved processing time efficiency in
machine algorithms, which in turn have strengthened the suggestion that human
vision also employs a similar early selection strategy. However, at least one
set of critical tests of this idea has never been performed with respect to the
role of early selection in human vision. How would the best of the current
saliency models perform on the stimuli used by experimentalists who first
provided evidence for this visual processing paradigm? Would the algorithms
really provide correct candidate sub-images to enable fast categorization on
those same images? Do humans really need this early selection for their
impressive performance? Here, we report on a new series of tests of these
questions whose results suggest that it is quite unlikely that such an early
selection process has any role in human rapid visual categorization.
Comment: 22 pages, 9 figures
Agreeing to Cross: How Drivers and Pedestrians Communicate
The contribution of this paper is twofold. The first is a novel dataset for
studying behaviors of traffic participants while crossing. Our dataset contains
more than 650 samples of pedestrian behaviors in various street configurations
and weather conditions. These examples were selected from approx. 240 hours of
driving on city, suburban, and urban roads. The second contribution is an
analysis of our data from the point of view of joint attention. We identify
what types of non-verbal communication cues road users use at the point of
crossing, their responses, and under what circumstances the crossing event
takes place. It was found that in more than 90% of the cases pedestrians gaze
at the approaching cars prior to crossing in non-signalized crosswalks. The
crossing action, however, depends on additional factors such as time to
collision (TTC), an explicit driver reaction, or the structure of the crosswalk.
Comment: 6 pages, 6 figures
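As an illustration of the TTC factor mentioned above, a minimal sketch (our simplification, not the paper's measurement procedure) computes TTC as the current distance to the crossing point divided by the vehicle's closing speed, assuming constant speed:

```python
def time_to_collision(distance_m, closing_speed_mps):
    """Time to collision (TTC) in seconds: how long until the vehicle
    reaches the pedestrian's position, assuming a constant closing speed.
    Illustrative helper; names and units are our assumptions."""
    if closing_speed_mps <= 0:
        return float("inf")  # stopped or receding vehicle never collides
    return distance_m / closing_speed_mps

# A car 30 m away closing at 10 m/s leaves the pedestrian 3 s to decide.
print(time_to_collision(30.0, 10.0))  # 3.0
```

In practice TTC would be estimated from tracked vehicle positions over successive video frames rather than given directly.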
Understanding and Modeling the Effects of Task and Context on Drivers' Gaze Allocation
Understanding what drivers look at is important for many applications,
including driver training, monitoring, and assistance, as well as self-driving.
Traditionally, factors affecting human visual attention have been divided into
bottom-up (involuntary attraction to salient regions) and top-down (task- and
context-driven). Although both play a role in drivers' gaze allocation, most of
the existing modeling approaches apply techniques developed for bottom-up
saliency and do not consider task and context influences explicitly. Likewise,
common driving attention benchmarks lack relevant task and context annotations.
Therefore, to enable analysis and modeling of these factors for drivers' gaze
prediction, we do the following: 1) address some shortcomings of the
popular DR(eye)VE dataset and extend it with per-frame annotations for driving
task and context; 2) benchmark a number of baseline and SOTA models for
saliency and driver gaze prediction and analyze them with respect to the new
annotations; and finally, 3) propose a novel model that modulates drivers' gaze
prediction with explicit action and context information and, as a result,
significantly improves SOTA performance on DR(eye)VE overall (by 24% KLD and
89% NSS) and on a subset of action- and safety-critical intersection scenarios
(by 10-30% KLD). The extended annotations and the code for the model and
evaluation will be made publicly available.
Comment: 12 pages, 8 figures, 8 tables
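For reference, KLD and NSS are standard gaze-prediction evaluation metrics: KLD compares the predicted saliency map to the ground-truth gaze distribution (lower is better), while NSS averages the z-scored prediction at fixated pixels (higher is better). A minimal sketch of both, with hypothetical array inputs and simplified normalization:

```python
import numpy as np

def kld(pred, gt, eps=1e-7):
    """Kullback-Leibler divergence between ground-truth and predicted
    saliency maps, each normalized to a probability distribution.
    Lower is better; simplified illustration of the common definition."""
    p = pred / (pred.sum() + eps)
    q = gt / (gt.sum() + eps)
    return float(np.sum(q * np.log(eps + q / (p + eps))))

def nss(pred, fixations):
    """Normalized Scanpath Saliency: mean of the z-scored prediction at
    fixated pixels (fixations is a binary map). Higher is better."""
    s = (pred - pred.mean()) / (pred.std() + 1e-7)
    return float(s[fixations > 0].mean())
```

The benchmark implementations used in the paper may differ in details such as map blurring and normalization; this sketch only conveys what the reported percentage improvements are measured in.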
A Focus on Selection for Fixation
A computational explanation of how visual attention, interpretation of visual stimuli, and eye movements combine to produce visual behavior seems elusive. Here, we focus on one component: how selection is accomplished for the next fixation. The popularity of saliency map models drives the inference that this problem is solved, but we argue otherwise. We provide arguments that a cluster of complementary conspicuity representations drives selection, modulated by task goals and history, leading to a hybrid process that encompasses both early and late attentional selection. This design is also constrained by the architectural characteristics of the visual processing pathways. These elements combine into a new strategy for computing fixation targets, and a first simulation of its performance is presented. A sample video of this performance can be found by clicking on the "Supplementary Files" link under the "Article Tools" heading.
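The selection idea described above can be caricatured in a few lines. Assuming a task-weighted combination of conspicuity maps and an inhibition-of-return mask standing in for fixation history, this toy sketch (our illustration, not the authors' actual model) picks the next fixation as the peak of the modulated map:

```python
import numpy as np

def next_fixation(conspicuity_maps, task_weights, inhibition):
    """Toy winner-take-all selection: combine several conspicuity maps
    with task-dependent weights, suppress recently fixated locations
    (inhibition of return), and fixate the remaining peak.
    The names and the linear combination rule are our assumptions."""
    combined = sum(w * m for w, m in zip(task_weights, conspicuity_maps))
    combined = combined * (1.0 - inhibition)  # history modulation
    return np.unravel_index(np.argmax(combined), combined.shape)
```

Raising the weight of one map shifts the winning location toward that map's peaks, which is the sense in which task goals "modulate" an otherwise bottom-up selection.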